{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bootstrap Examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_This setup code is required to run in an IPython notebook_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import seaborn\n",
    "\n",
    "seaborn.set_style(\"darkgrid\")\n",
    "plt.rc(\"figure\", figsize=(16, 6))\n",
    "plt.rc(\"savefig\", dpi=90)\n",
    "plt.rc(\"font\", family=\"sans-serif\")\n",
    "plt.rc(\"font\", size=14)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sharpe Ratio"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Sharpe Ratio is an important measure of return per unit of risk.  The example shows how to estimate the variance of the Sharpe Ratio and how to construct confidence intervals for the Sharpe Ratio using a long series of U.S. equity data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import arch.data.frenchdata\n",
    "\n",
    "ff = arch.data.frenchdata.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data set contains the Fama-French factors, including the excess market return."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "excess_market = ff.iloc[:, 0]\n",
    "print(ff.describe())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next step is to construct a function that computes the Sharpe Ratio.  This function also return the annualized mean and annualized standard deviation which will allow the covariance matrix of these parameters to be estimated using the bootstrap."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sharpe_ratio(x):\n",
    "    mu, sigma = 12 * x.mean(), np.sqrt(12 * x.var())\n",
    "    values = np.array([mu, sigma, mu / sigma]).squeeze()\n",
    "    index = [\"mu\", \"sigma\", \"SR\"]\n",
    "    return pd.Series(values, index=index)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function can be called directly on the data to show full sample estimates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "params = sharpe_ratio(excess_market)\n",
    "params"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reproducibility\n",
    "\n",
    "All bootstraps accept the keyword argument `seed` which can contain a NumPy `Generator` or `RandomState` or an `int`.  When using an `int`, the argument is passed `np.random.default_rng` to create the core generator. This allows the same pseudo random values to be used across multiple runs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### _Warning_\n",
    "\n",
    "_The bootstrap chosen must be appropriate for the data.  Squared returns are serially correlated, and so a time-series bootstrap is required._"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bootstraps are initialized with any bootstrap specific parameters and the data to be used in the bootstrap.  Here the `12` is the average window length in the Stationary Bootstrap, and the next input is the data to be bootstrapped."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from arch.bootstrap import StationaryBootstrap\n",
    "\n",
    "# Initialize with entropy from random.org\n",
    "entropy = [877788388, 418255226, 989657335, 69307515]\n",
    "seed = np.random.default_rng(entropy)\n",
    "\n",
    "bs = StationaryBootstrap(12, excess_market, seed=seed)\n",
    "results = bs.apply(sharpe_ratio, 2500)\n",
    "SR = pd.DataFrame(results[:, -1:], columns=[\"SR\"])\n",
    "fig = SR.hist(bins=40)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cov = bs.cov(sharpe_ratio, 1000)\n",
    "cov = pd.DataFrame(cov, index=params.index, columns=params.index)\n",
    "print(cov)\n",
    "se = pd.Series(np.sqrt(np.diag(cov)), index=params.index)\n",
    "se.name = \"Std Errors\"\n",
    "print(\"\\n\")\n",
    "print(se)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ci = bs.conf_int(sharpe_ratio, 1000, method=\"basic\")\n",
    "ci = pd.DataFrame(ci, index=[\"Lower\", \"Upper\"], columns=params.index)\n",
    "print(ci)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternative confidence intervals can be computed using a variety of methods.  Setting `reuse=True` allows the previous bootstrap results to be used when constructing confidence intervals using alternative methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ci = bs.conf_int(sharpe_ratio, 1000, method=\"percentile\", reuse=True)\n",
    "ci = pd.DataFrame(ci, index=[\"Lower\", \"Upper\"], columns=params.index)\n",
    "print(ci)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Optimal Block Length Estimation\n",
    "\n",
    "The function `optimal_block_length` can be used to estimate the optimal block lengths for the Stationary and Circular bootstraps. Here we use the squared market return since the Sharpe ratio depends on the mean and the variance, and the autocorrelation in the squares is stronger than in the returns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from arch.bootstrap import optimal_block_length\n",
    "\n",
    "opt = optimal_block_length(excess_market**2)\n",
    "print(opt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can repeat the analysis above using the estimated optimal block length.  Here we see that the extremes appear to be slightly more extreme."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reinitialize using the same entropy\n",
    "rs = np.random.default_rng(entropy)\n",
    "\n",
    "bs = StationaryBootstrap(opt.loc[\"Mkt-RF\", \"stationary\"], excess_market, seed=seed)\n",
    "results = bs.apply(sharpe_ratio, 2500)\n",
    "SR = pd.DataFrame(results[:, -1:], columns=[\"SR\"])\n",
    "fig = SR.hist(bins=40)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Probit (statsmodels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second example makes use of a Probit model from statsmodels.  The demo data is university admissions data which contains a binary variable for being admitted, GRE score, GPA score and quartile rank. This data is downloaded from the internet and imported using pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import arch.data.binary\n",
    "\n",
    "binary = arch.data.binary.load()\n",
    "binary = binary.dropna()\n",
    "print(binary.describe())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fitting the model directly"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first steps are to build the regressor and the dependent variable arrays.  Then, using these arrays, the model can be estimated by calling `fit`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import statsmodels.api as sm\n",
    "\n",
    "endog = binary[[\"admit\"]]\n",
    "exog = binary[[\"gre\", \"gpa\"]]\n",
    "const = pd.Series(np.ones(exog.shape[0]), index=endog.index)\n",
    "const.name = \"Const\"\n",
    "exog = pd.DataFrame([const, exog.gre, exog.gpa]).T\n",
    "\n",
    "# Estimate the model\n",
    "mod = sm.Probit(endog, exog)\n",
    "fit = mod.fit(disp=0)\n",
    "params = fit.params\n",
    "print(params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The wrapper function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most models in statsmodels are implemented as classes, require an explicit call to `fit` and return a class containing parameter estimates and other quantities.  These classes cannot be directly used with the bootstrap methods.  However, a simple wrapper can be written that takes the data as the only inputs and returns parameters estimated using a Statsmodel model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def probit_wrap(endog, exog):\n",
    "    return sm.Probit(endog, exog).fit(disp=0).params"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A call to this function should return the same parameter values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "probit_wrap(endog, exog)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The wrapper can be directly used to estimate the parameter covariance or to construct confidence intervals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from arch.bootstrap import IIDBootstrap\n",
    "\n",
    "bs = IIDBootstrap(endog=endog, exog=exog)\n",
    "cov = bs.cov(probit_wrap, 1000)\n",
    "cov = pd.DataFrame(cov, index=exog.columns, columns=exog.columns)\n",
    "print(cov)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "se = pd.Series(np.sqrt(np.diag(cov)), index=exog.columns)\n",
    "print(se)\n",
    "print(\"T-stats\")\n",
    "print(params / se)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ci = bs.conf_int(probit_wrap, 1000, method=\"basic\")\n",
    "ci = pd.DataFrame(ci, index=[\"Lower\", \"Upper\"], columns=exog.columns)\n",
    "print(ci)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Speeding things up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Starting values can be provided to `fit` which can save time finding starting values.  Since the bootstrap parameter estimates should be close to the original sample estimates, the full sample estimated parameters are reasonable starting values.  These can be passed using the `extra_kwargs` dictionary to a modified wrapper that will accept a keyword argument containing starting values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def probit_wrap_start_params(endog, exog, start_params=None):\n",
    "    return sm.Probit(endog, exog).fit(start_params=start_params, disp=0).params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bs.reset()  # Reset to original state for comparability\n",
    "cov = bs.cov(\n",
    "    probit_wrap_start_params, 1000, extra_kwargs={\"start_params\": params.values}\n",
    ")\n",
    "cov = pd.DataFrame(cov, index=exog.columns, columns=exog.columns)\n",
    "print(cov)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bootstrapping Uneven Length Samples\n",
    "Independent samples of uneven length are common in experiment settings, e.g., A/B testing of a website.  The `IIDBootstrap` allows for arbitrary dependence within an observation index and so cannot be naturally applied to these data sets. The `IndependentSamplesBootstrap` allows datasets with variables of different lengths to be sampled by exploiting the independence of the values to separately bootstrap each component. Below is an example showing how a confidence interval can be constructed for the difference in means of two groups. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from arch.bootstrap import IndependentSamplesBootstrap\n",
    "\n",
    "\n",
    "def mean_diff(x, y):\n",
    "    return x.mean() - y.mean()\n",
    "\n",
    "\n",
    "rs = np.random.RandomState(0)\n",
    "treatment = 0.2 + rs.standard_normal(200)\n",
    "control = rs.standard_normal(800)\n",
    "\n",
    "bs = IndependentSamplesBootstrap(treatment, control, seed=seed)\n",
    "print(bs.conf_int(mean_diff, method=\"studentized\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
